WWW Search Systems Using SQL*TextRetrieval and Parallel Server for Structured and Unstructured Data

Authors

  • Gang Cheng
  • Piotr Sokolowski

Abstract

We describe our experience in developing Web search systems using Oracle’s SQL*TextRetrieval. In the prototype system, we store on-line books in HTML together with the HTML documents of a web site. SQL*TextRetrieval is used to index the full text and other structured data in this ’web space’ and to provide an efficient engine for free-text search. The Web enables global access to, and maximum information sharing with, a hypertext-based text retrieval system. Using Oracle’s Free Text Retrieval technology, we implement various search options, including basic word stemming, phrase, fuzzy, and soundex searching, as well as more advanced proximity and concept searching. For the concept search option, we have integrated a public-domain "Roget Thesaurus" into our text search system to support synonym expansion. We also describe an advanced search mechanism that recursively refines the search domain via the Web. The prototype system can be found at URL . A full production system will be implemented on a multiprocessor parallel machine using parallel Oracle 7 with the Parallel Server and Parallel Query options.
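The abstract does not give the SQL*TextRetrieval query syntax, so the following is only a plain-Python sketch of the idea behind thesaurus-based concept search, not Oracle's API. It assumes that each query term is expanded into an OR group of itself plus its synonyms, and that a document must match every group (AND across terms). The `THESAURUS` entries and the helper names `expand_query` and `concept_search` are hypothetical, and no stemming is applied.

```python
import re

# Hypothetical synonym entries in the spirit of a Roget-style thesaurus;
# the real system's thesaurus contents and format are not given in the abstract.
THESAURUS = {
    "car": ["automobile", "vehicle"],
    "fast": ["quick", "rapid", "swift"],
}


def expand_query(terms, thesaurus):
    """Map each query term to an OR group: the term itself plus its synonyms."""
    groups = []
    for term in terms:
        term = term.lower()
        groups.append({term, *thesaurus.get(term, [])})
    return groups


def tokenize(text):
    """Lower-case the text and return its set of alphabetic words (no stemming)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def concept_search(query, documents, thesaurus):
    """Return sorted ids of documents that hit at least one word in every OR group."""
    groups = expand_query(query.split(), thesaurus)
    hits = []
    for doc_id, text in documents.items():
        words = tokenize(text)
        if all(group & words for group in groups):
            hits.append(doc_id)
    return sorted(hits)


docs = {
    "d1": "A quick automobile on the highway.",
    "d2": "The car was slow.",
    "d3": "A swift vehicle on rapid transit.",
}
print(concept_search("fast car", docs, THESAURUS))  # → ['d1', 'd3']
```

Here "fast car" matches documents that never contain either literal word, which is the effect synonym expansion is meant to provide. In a real deployment, the expansion would more plausibly be pushed into the text engine's own query than evaluated by scanning documents in application code.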

Similar resources

On the Integration of Structured Data and Text: A Review of the SIRE Architecture (invited talk)

1.0 Introduction Over the past decade, members of the Information Retrieval Lab have designed, developed, and deployed a variety of information retrieval systems. A central theme for all of our systems was the integration of structured data and text. One of our more recent efforts, SIRE, a Scalable Information Retrieval Engine [Grossman97, Grossman98, Lundquist99] is the focus of this paper. Fo...

Implementing XML Schema inside a relational database

XML Schema has emerged as a promising data model that unites structured and unstructured content. The Oracle database has led the commercial database community in integrating support for XML Schema inside an enterprise data server. The foundation for this was laid with the absorption of the SQL:1999 'object-relational' type system in the database, which provided the necessary hierarchical abstra...

An effective and versatile keyword search engine on heterogenous data sources

We present EASE, an effective and versatile keyword search engine that enables users to easily access the heterogenous data composed of unstructured, semi-structured and structured data, without the need of learning XPath/XQuery or SQL languages. EASE addresses a challenge in keyword search that has been neglected in the literature: how to efficiently and adaptively process keyword queries on t...

DOW: An Extended Database Browser over the WWW

DOW is an extended World Wide Web interface for database systems. In addition to WWW browsing and navigation, DOW supports HTML document management as well-structured data objects in an ODBMS. We can access stored objects and adapted HTML documents using SQL, keywords, and schema browser. DOW has four main features. First, to manage HTML documents within a database, we introduced an HTML adapte...

Accessing Databases Through the World Wide Web: Issues and Current Practice

This paper examines the issues, current practices and trends for the future regarding accessing traditional databases through the World Wide Web (WWW). It is often easy to forget that the WWW has no data management facilities of its own; it is simply a “pointing” mechanism. Databases have historically focused on rich toolsets for describing, updating, and modelling data. WWW adds to this a plat...

Publication date: 2000